Results 1 - 20 of 29
1.
Viruses ; 15(5)2023 04 25.
Article in English | MEDLINE | ID: covidwho-20235598

ABSTRACT

Drug appropriateness is a pillar of modern evidence-based medicine, but the turnaround times of genomic sequencing are not compatible with the urgent need to deliver treatments against microorganisms. Massive worldwide genomic surveillance has created an unprecedented landscape for exploiting viral sequencing for therapeutic purposes. For therapeutic antiviral antibodies, the IC50 against specific polymorphisms of the target antigen can be calculated in vitro, and a list of mutations leading to drug resistance (immune escape) can be compiled. The author encountered this type of knowledge (available from the Stanford University Coronavirus Antiviral Resistance Database) in a publicly accessible repository of SARS-CoV-2 sequences. The author used a custom function of the CoV-Spectrum.org web portal to deliver up-to-date, regional prevalence estimates of baseline efficacy for each authorized anti-spike mAb across all co-circulating SARS-CoV-2 sublineages at a given time point. This publicly accessible tool can inform therapeutic choices that would otherwise be blind.
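
The idea of a lineage-weighted baseline efficacy estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CoV-Spectrum.org implementation: the function name, the lineage labels, the sequence counts, the fold-reduction values, and the 10-fold resistance cutoff are all hypothetical, and lineages with no in vitro data are conservatively counted as resistant.

```python
from typing import Dict


def expected_susceptible_fraction(
    lineage_counts: Dict[str, float],
    fold_reduction: Dict[str, float],
    resistance_cutoff: float = 10.0,  # assumed cutoff, not taken from the abstract
) -> float:
    """Share of circulating sequences still expected to be neutralized by one mAb.

    lineage_counts: sequences (or prevalence shares) per co-circulating lineage.
    fold_reduction: fold-increase in IC50 relative to wild type, per lineage.
    Lineages missing from fold_reduction are treated as resistant.
    """
    total = sum(lineage_counts.values())
    if total <= 0:
        raise ValueError("lineage counts must sum to a positive value")
    susceptible = sum(
        count
        for lineage, count in lineage_counts.items()
        if fold_reduction.get(lineage, float("inf")) < resistance_cutoff
    )
    return susceptible / total


# Invented example data for one region and one antibody.
counts = {"LineageA": 60, "LineageB": 30, "LineageC": 10}
folds = {"LineageA": 1.5, "LineageB": 250.0}  # LineageC has no data
print(expected_susceptible_fraction(counts, folds))  # 0.6
```

Repeating the calculation per antibody and per region would give a table of the kind the abstract describes; a real tool would also need to handle lineage aliases and sampling dates.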


Subject(s)
COVID-19 , Humans , SARS-CoV-2/genetics , Genomics , Antibodies, Monoclonal/therapeutic use , Antibodies, Viral/therapeutic use , Antiviral Agents , Spike Glycoprotein, Coronavirus/genetics , Antibodies, Neutralizing
2.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: covidwho-2313424

ABSTRACT

BACKGROUND: Since the beginning of the coronavirus disease 2019 pandemic, there has been an explosion of sequencing of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, making it the most widely sequenced virus in history. Several databases and tools have been created to keep track of genome sequences and variants of the virus; most notably, the GISAID platform hosts millions of complete genome sequences, and it is continuously expanding every day. A challenging task is the development of fast and accurate tools that are able to distinguish between the different SARS-CoV-2 variants and assign them to a clade. RESULTS: In this article, we leverage the frequency chaos game representation (FCGR) and convolutional neural networks (CNNs) to develop an original method that learns how to classify genome sequences, which we implement in CouGaR-g, a tool for the clade assignment problem on SARS-CoV-2 sequences. On a testing subset of the GISAID, CouGaR-g achieved a 96.29% overall accuracy, while a similar tool, Covidex, obtained a 77.12% overall accuracy. As far as we know, our method is the first using deep learning and FCGR for intraspecies classification. Furthermore, by using some feature importance methods, CouGaR-g makes it possible to identify k-mers that match SARS-CoV-2 marker variants. CONCLUSIONS: By combining FCGR and CNNs, we develop a method that achieves a better accuracy than Covidex (which is based on random forest) for clade assignment of SARS-CoV-2 genome sequences, also thanks to our training on a much larger dataset, with comparable running times. Our method implemented in CouGaR-g is able to detect k-mers that capture relevant biological information that distinguishes the clades, known as marker variants. AVAILABILITY: The trained models can be tested online by providing a FASTA file (with 1 or multiple sequences) at https://huggingface.co/spaces/BIASLab/sars-cov-2-classification-fcgr. 
CouGaR-g is also available at https://github.com/AlgoLab/CouGaR-g under the GPL.
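
For readers unfamiliar with the frequency chaos game representation, the sketch below shows one way to build it: every k-mer is mapped to a cell of a 2^k x 2^k grid and the cell stores the k-mer count, giving an image-like input that a CNN can consume. This is a minimal illustration, not the CouGaR-g code; the function name and the corner assignment (A, C, G, T) are assumptions, and other layouts are in use.

```python
from typing import List

# One common corner convention (an assumption here): each base contributes
# one bit to the column (x) and one bit to the row (y).
CORNER_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int) -> List[List[int]]:
    """Return a 2^k x 2^k grid of k-mer counts for a nucleotide sequence."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNER_BITS for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        row = col = 0
        # In the chaos game the last base read decides the coarsest quadrant,
        # so the k-mer is traversed in reverse (last base = most significant bit).
        for base in reversed(kmer):
            x_bit, y_bit = CORNER_BITS[base]
            col = (col << 1) | x_bit
            row = (row << 1) | y_bit
        grid[row][col] += 1
    return grid


# With k = 1 each base gets its own cell.
print(fcgr("ACGT", 1))  # [[1, 1], [1, 1]]
```

Dividing each cell by the total count turns the grid into frequencies, which makes genomes of slightly different lengths comparable before they are passed to a classifier.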


Subject(s)
COVID-19 , Deep Learning , Puma , Animals , SARS-CoV-2/genetics , Puma/genetics , Genome, Viral
3.
Vaccines (Basel) ; 11(4)2023 Mar 27.
Article in English | MEDLINE | ID: covidwho-2296235

ABSTRACT

Coronaviruses are a family of RNA viruses that cause disease in birds and mammals, including humans, and can cause respiratory tract infections. The COVID-19 pandemic has badly affected every part of the world. Our study aimed to explore the genome of SARS-CoV-2, followed by in silico analysis of its proteins. Different nucleotide and protein variants of SARS-CoV-2 were retrieved from NCBI. Contigs and consensus sequences were developed to identify these variants using SnapGene. Data for the variants that significantly differed from each other were run through the Predict Protein software to understand the changes produced in the protein structure. The SOPMA web server was used to predict the secondary structure of the proteins. Tertiary structure details of the selected proteins were analyzed using the web server SWISS-MODEL. Sequencing results showed numerous single nucleotide polymorphisms in the surface glycoprotein, nucleocapsid, ORF1a, and ORF1ab polyprotein, while the envelope, membrane, ORF3a, ORF6, ORF7a, ORF8, and ORF10 genes had no or few SNPs. Contigs were used to identify variations in the Alpha and Delta variants of SARS-CoV-2 relative to the reference strain (Wuhan). Some of the secondary structures of the SARS-CoV-2 proteins were predicted using SOPMA and were further compared with those of the reference strain (Wuhan) proteins. The tertiary structure details of only the spike proteins were analyzed through SWISS-MODEL and Ramachandran plots. Through SWISS-MODEL, the tertiary structure models of the SARS-CoV-2 spike protein of the Alpha and Delta variants were compared with that of the reference strain (Wuhan). Alpha and Delta variants of the SARS-CoV-2 isolates submitted to GISAID from Pakistan with changes in structural and nonstructural proteins were compared with the reference strain, and 3D structure mapping of the spike glycoprotein and mutations in the amino acids were seen. 
The unexpectedly rapid transmission of SARS-CoV-2 has forced numerous countries to impose total lockdowns. In this research, we employed in silico computational tools to analyze SARS-CoV-2 genomes worldwide to detect vital variations in structural proteins and dynamic changes in all SARS-CoV-2 proteins, mainly spike proteins, produced by many mutations. Our analysis revealed substantial functional, immunological, physicochemical, and structural differences among the SARS-CoV-2 isolates. However, the real impact of these SNPs can only be determined by further experiments. Our results can aid in vivo and in vitro experiments in the future.

4.
EBioMedicine ; 91: 104534, 2023 May.
Article in English | MEDLINE | ID: covidwho-2251640

ABSTRACT

BACKGROUND: Throughout the COVID-19 pandemic, the SARS-CoV-2 virus has continued to evolve, with new variants outcompeting existing variants and often leading to different dynamics of disease spread. METHODS: In this paper, we performed a retrospective analysis using longitudinal sequencing data to characterize differences in the speed, calendar timing, and magnitude of 16 SARS-CoV-2 variant waves/transitions for 230 countries and sub-country regions, between October 2020 and January 2023. We then clustered geographic locations in terms of their variant behavior across several Omicron variants, allowing us to identify groups of locations exhibiting similar variant transitions. Finally, we explored relationships between heterogeneity in these variant waves and time-varying factors, including vaccination status of the population, governmental policy, and the number of variants in simultaneous competition. FINDINGS: This work demonstrates associations between the behavior of an emerging variant and the number of co-circulating variants as well as the demographic context of the population. We also observed an association between high vaccination rates and variant transition dynamics prior to the Mu and Delta variant transitions. INTERPRETATION: These results suggest the behavior of an emergent variant may be sensitive to the immunologic and demographic context of its location. Additionally, this work represents the most comprehensive characterization of variant transitions globally to date. FUNDING: Laboratory Directed Research and Development (LDRD), Los Alamos National Laboratory.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/prevention & control , Pandemics , Retrospective Studies
5.
Tohoku J Exp Med ; 260(1): 21-27, 2023 May 09.
Article in English | MEDLINE | ID: covidwho-2248420

ABSTRACT

The genomes of sarbecoviruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), incorporate mutations with short sequence exchanges based on unknown processes. Currently, the presence of such short-sequence exchanges among the genomes of different SARS-CoV-2 lineages remains uncertain. In the present study, multiple SARS-CoV-2 genome sequences from different clades or sublineages were collected from an international mass sequence database and compared to identify the presence of short sequence exchanges. Initial screening with multiple sequence alignments identified two locations with trinucleotide substitutions, both in the nucleocapsid (N) gene. The first exchange from 5'-GAT-3' to 5'-CTA-3' at nucleotide positions 28,280-28,282 resulted in a change in the amino acid from aspartic acid (D) to leucine (L), which was predominant in clade GRY (Alpha). The second exchange from 5'-GGG-3' to 5'-AAC-3' at nucleotide positions 28,881-28,883 resulted in an amino acid change from arginine and glycine (RG) to lysine and arginine (KR), which was predominant in GR (Gamma), GRY (Alpha), and GRA (Omicron). Both trinucleotide substitutions occurred before June 2020. The sequence identity rate between these lineages suggests that coincidental succession of single-nucleotide substitutions is unlikely. A Basic Local Alignment Search Tool (BLAST) sequence search revealed the absence of intermediate mutations based on single-base substitutions or overlapping indels before the emergence of these trinucleotide substitutions. These findings suggest that the trinucleotide substitutions could have developed via an en bloc exchange. In summary, trinucleotide substitutions at two locations in the SARS-CoV-2 N gene were identified. These mutations may provide insights into the evolution of SARS-CoV-2.
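
The amino acid consequences reported above can be checked with the standard genetic code. The short Python sketch below is illustrative only: the helper names are hypothetical, and the six-base reading-frame context AGG GGA used for the second exchange is an assumption based on the widely reported R203K/G204R description; it is not stated in the abstract. It shows that GAT to CTA fills a single codon (D to L), whereas GGG to AAC straddles two codons and therefore changes two residues (RG to KR).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def substitute(dna: str, offset: int, replacement: str) -> str:
    """Replace len(replacement) bases starting at a 0-based offset."""
    return dna[:offset] + replacement + dna[offset + len(replacement):]


# First exchange: the trinucleotide coincides with one codon.
print(translate("GAT"), "->", translate("CTA"))  # D -> L

# Second exchange: assumed context AGG GGA, with GGG at offsets 1-3.
context = "AGGGGA"
mutated = substitute(context, 1, "AAC")
print(translate(context), "->", translate(mutated))  # RG -> KR
```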


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/genetics , Mutation/genetics , Nucleocapsid/genetics , Nucleotides , Amino Acids/genetics , Phylogeny
6.
Genes (Basel) ; 14(1)2023 Jan 09.
Article in English | MEDLINE | ID: covidwho-2243028

ABSTRACT

Omicron variants have been classified as Variants of Concern (VOC) by the World Health Organization (WHO) since they first emerged, because significant mutations in this variant were shown to affect the transmissibility and virulence of the virus, as evidenced by the ongoing modifications in the SARS-CoV-2 virus. As part of the global pandemic, the Omicron variant also spread among the Kurdish population. This study aimed to analyze different strains from different cities of the Kurdistan region of Iraq to show the risk of infection and the impact of the various mutations on immune responses and vaccination. A total of 175 nasopharyngeal/oropharyngeal specimens were collected at West Erbil Emergency Hospital and confirmed for SARS-CoV-2 infection by RT-PCR. The genomes of the samples were sequenced using the Illumina COVID-Seq Method. The genome analysis was based on previously published data in the GISAID database, and the detected mutations were compared with those previously reported in Omicron variants; the samples belong to the BA.1 lineage and include most of the variations that other studies have related to transmissibility, high infectivity and immune escape. Most of the mutations were found in the RBD (receptor binding domain), the region related to the escape from humoral immunity. Remarkably, the point mutations G339D, S371L, S373P, S375F, T547K, D614G, H655Y, N679K and N969K were also detected in this study; these were unique, and their impact deserves further study. Overall, the Omicron variants were more contagious than other variants. However, the mortality rate was low, and most cases were asymptomatic. The next step should address the potential of Omicron variants for developing the next-generation COVID-19 vaccine.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Phylogeny , Iraq/epidemiology , COVID-19 Vaccines , COVID-19/epidemiology , COVID-19/genetics , Genomics
7.
Front Microbiol ; 13: 1089399, 2022.
Article in English | MEDLINE | ID: covidwho-2238422

ABSTRACT

Introduction: The world is still struggling against the pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in 2022. The pandemic has been facilitated by the intermittent emergence of variant strains, which has been explained and classified mainly by the patterns of point mutations of the spike (S) gene. However, the profiles of insertions/deletions (indels) in SARS-CoV-2 genomes during the pandemic remain largely unevaluated. Methods: In this study, we first screened for the genome regions of polymorphic indel sites by performing multiple sequence alignment; then, NCBI BLAST search and GISAID database search were performed to comprehensively investigate the indel profiles at the polymorphic indel hotspot and elucidate the emergence and spread of the indels in time and geographical distribution. Results: A polymorphic indel hotspot was identified in the N-terminal domain of the S gene at approximately nucleotide position 22,200, corresponding to amino acid positions 210-215 of the SARS-CoV-2 S protein. This polymorphic hotspot comprised an adjacent 3-base deletion (5'-ATT-3'; Spike_N211del) and a 9-base insertion (5'-AGCCAGAAG-3'; Spike_ins214EPE). By performing NCBI BLAST search and GISAID database search, we identified several types of tandem repeats of the 9-base insertion, creating an 18-base insertion (Spike_ins214EPEEPE, Spike_ins214EPDEPE). The results of the searches suggested that the two-cycle tandem repeats of the 9-base insertion were created in November 2021 in Central Europe, whereas the emergence of the original one-cycle 9-base insertion (Spike_ins214EPE) would date back to the middle of 2020 and occurred outside Central Europe. The identified 18-base insertions based on the two-cycle tandem repeat of the 9-base insertion were collected between November 2021 and April 2022, suggesting that these mutations could not survive and have already been eliminated. 
Discussion: The GISAID database search implied that this polymorphic indel hotspot has one of the highest tolerances for incorporating indels in the SARS-CoV-2 S gene. In summary, the present study identified a variable number of tandem repeats of a 9-base insertion in the N-terminal domain of the SARS-CoV-2 S gene, and the repeat could have occurred at a different time from the original 9-base insertion.

8.
PNAS Nexus ; 1(4): pgac197, 2022 Sep.
Article in English | MEDLINE | ID: covidwho-2222708

ABSTRACT

Mutations in nonstructural protein 3 (nsp3) and nsp4 of SARS-CoV-2, presumably induced by the asthma drug ciclesonide (which also has anti-SARS-CoV-2 activity), were counted in 5,851 cases in the GISAID EpiCoV genome database. Sporadic occurrences of mutants not linked to each other in the phylogenetic tree were identified at least 88 times; of these, 58 had one or more descendants in the same branch. Five of these had spread to more than 100 cases, and one had expanded to 4,748 cases, suggesting the mutations are frequent, selected in individual patients, and transmitted to form clusters of cases. Clinical trials of ciclesonide as a treatment for COVID-19 are the presumed cause of the frequent occurrence of mutations between June 2020 and November 2021. In addition, because ciclesonide is a common treatment for asthma, it can drive mutations in asthmatics suffering from COVID-19. Ciclesonide-resistant mutations, which have unpredictable effects in humans, are likely to continue to emerge because SARS-CoV-2 remains prevalent globally.

9.
Genomics ; 114(6): 110497, 2022 Sep 28.
Article in English | MEDLINE | ID: covidwho-2050084

ABSTRACT

The goal of this study was to identify the genomic variants and determine the molecular epidemiology of the SARS-CoV-2 virus during the early pandemic stage in Bangladesh. Viral RNA was extracted, converted to cDNA, and amplified using the Ion AmpliSeq™ SARS-CoV-2 Research Panel. A total of 413 unique mutants from 151 viral isolates were identified. Eighty percent of cases belonged to 8 mutants: 241C to T, 1163A to T, 3037C to T, 14408C to T, 23403A to G, 28881G to A, 28882G to A, and 28883G to C. We observed a dominance of GR clade variants, which have a strong presence in Europe, suggesting a European channel as a possible entry route. Among 37 genomic mutants significantly associated with clinical symptoms, 3916C to T (associated with sore throat), 14408C to T (associated with cough protection), and 28881G to A, 28882G to A, and 28883G to C (associated with chest pain) were notable. These findings may inform future research platforms for disease management and epidemiological study.

10.
Microorganisms ; 10(7)2022 Jul 05.
Article in English | MEDLINE | ID: covidwho-1917629

ABSTRACT

Here, we report the emergence of the variant lineage B.1.1.523 that contains a set of mutations including 156_158del, E484K and S494P in the spike protein. E484K and S494P are known to significantly reduce SARS-CoV-2 neutralization by convalescent and vaccinated sera and are considered as mutations of concern. Lineage B.1.1.523 presumably originated in the Russian Federation and spread across European countries with the peak of transmission in April-May 2021. The B.1.1.523 lineage has now been reported from 31 countries. In this article, we analyze the possible origin of this mutation subset and its immune response using in silico methods.

11.
Patterns (N Y) ; 3(9): 100562, 2022 Sep 09.
Article in English | MEDLINE | ID: covidwho-1914886

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome data are essential for epidemiology, vaccine development, and tracking emerging variants. Millions of SARS-CoV-2 genomes have been sequenced during the pandemic. However, downloading SARS-CoV-2 genomes from databases is slow and unreliable, largely due to suboptimal choice of compression method. We evaluated the available compressors and found that Nucleotide Archival Format (NAF) would provide a drastic improvement compared with current methods. For Global Initiative on Sharing Avian Flu Data's (GISAID) pre-compressed datasets, NAF would increase efficiency 52.2 times for gzip-compressed data and 3.7 times for xz-compressed data. For DNA DataBank of Japan (DDBJ), NAF would improve throughput 40 times for gzip-compressed data. For GenBank and European Nucleotide Archive (ENA), NAF would accelerate data distribution by a factor of 29.3 times compared with uncompressed FASTA. This article provides a tutorial for installing and using NAF. Offering a NAF download option in sequence databases would provide a significant saving of time, bandwidth, and disk space and accelerate biological and medical research worldwide.

13.
Front Public Health ; 10: 887955, 2022.
Article in English | MEDLINE | ID: covidwho-1869435
14.
Virus Res ; 317: 198824, 2022 08.
Article in English | MEDLINE | ID: covidwho-1852224

ABSTRACT

The COVID-19 pandemic continues to pose a global health concern, despite the ongoing vaccination campaigns, due to the emergence and rapid spread of new variants of the causative agent SARS-CoV-2. These variants are identified and tracked via the marker mutations they carry, and the classification system put in place following tremendous sequencing efforts. In this study, the genomes of 1,230 Lebanese SARS-CoV-2 strains collected throughout 2 years of the outbreak in Lebanon were analyzed, 115 of which were sequenced within this project. Strains were classified into seven GISAID clades, the major one being GRY, and 36 Pango lineages, with three variants of concern identified: Alpha, Delta and Omicron. A time course distribution of GISAID clades allowed the visualization of change throughout the two years of the Lebanese outbreak, in conjunction with major events and measures in the country. Subsequent phylogenetic analysis showed the clustering of strains belonging to the same clades. In addition, a mutational survey showed the presence of mutations in the structural, non-structural and accessory proteins. Twenty-five (25) mutations were labeled as major, i.e. present in more than 30% of the strains, such as the common Spike_D614G and NSP3_T183I, whereas 635 were labeled as uncommon, i.e. found in very few of the analyzed strains as well as GISAID records, such as NSP2_I349V. Distribution of these mutations differed between 2020, and the first and the second half of 2021. In summary, this study highlights key genomic aspects of the Lebanese SARS-CoV-2 strains collected in 2020, the first year of the outbreak in Lebanon, versus those collected in 2021, the second year of COVID-19 in Lebanon.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Genomics , Humans , Mutation , Pandemics , Phylogeny , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics
15.
Viruses ; 14(4)2022 04 08.
Article in English | MEDLINE | ID: covidwho-1786076

ABSTRACT

Whole-genome sequencing (WGS) has played a significant role in understanding the epidemiology and biology of the SARS-CoV-2 virus. Here, we investigate the use of SARS-CoV-2 WGS in Southeast and East Asian countries for genomic surveillance during the COVID-19 pandemic. The Nottingham-Indonesia Collaboration for Clinical Research and Training (NICCRAT) initiative has facilitated collaboration between the University of Nottingham and a team in the Research Center for Biotechnology, National Research and Innovation Agency (BRIN), to carry out a small number of SARS-CoV-2 WGS in Indonesia using Oxford Nanopore Technology (ONT). Analyses of SARS-CoV-2 genomes deposited on GISAID reveal the importance of clinical and demographic metadata collection and the importance of open access and data sharing. Lineage and phylogenetic analyses of two periods defined by the Delta variant outbreak reveal that: (1) B.1.466.2 variants were the most predominant in Indonesia before the Delta variant outbreak, having a unique spike gene mutation N439K at more than 98% frequency, (2) the Delta variant sub-lineage AY.23 took over after June 2021, and (3) the highest rate of virus transmissions between Indonesia and other countries was through interactions with Singapore and Japan, two neighbouring countries with a high degree of access and travel to and from Indonesia.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Humans , Indonesia/epidemiology , Mutation , Pandemics , Phylogeny , SARS-CoV-2/genetics
16.
Front Genet ; 13: 801332, 2022.
Article in English | MEDLINE | ID: covidwho-1686466

ABSTRACT

Early detection of Severe Acute Respiratory Syndrome Corona Virus 2 (SARS-CoV-2) variants and use of data for public health action requires a coordinated, rapid, and high throughput approach to whole genome sequencing (WGS). Currently, WGS output from many low- and middle-income countries (LMIC) has lagged. By fostering diverse partnerships and multiple sequencing technologies, Indonesia accelerated SARS-CoV-2 WGS uploads to GISAID from 1,210 in April 2021 to 5,791 in August 2021, an increase from 11 submissions per day between January and May, to 43 per day between June and August. Turn-around-time from specimen collection to submission decreased from 77 to 5 days, allowing for timely public health decisions. These changes were enabled by establishment of the National Genomic Surveillance Consortium, coordination between public and private sector laboratories with WGS capability, and diversification of sequencing platform technologies. Here we present how diversification on multiple levels enabled a rapid and significant increase of national WGS performance, with potentially valuable lessons for other LMICs.

18.
Infect Prev Pract ; 3(4): 100190, 2021 Dec.
Article in English | MEDLINE | ID: covidwho-1531487

ABSTRACT

BACKGROUND: A characteristic feature of SARS-CoV-2 is its ability to transmit from pre- or asymptomatic patients, complicating the tracing of infection pathways and causing outbreaks. Despite several reports that whole genome sequencing (WGS) and haplotype networks are useful for epidemiologic analysis, little is known about their use in nosocomial infections. AIM: We aimed to demonstrate the advantages of genetic epidemiology in identifying the link in nosocomial infection by comparing single nucleotide variations (SNVs) of isolates from patients associated with an outbreak in Showa University Hospital. METHODS: We used specimens from 32 patients in whom COVID-19 had been diagnosed using clinical reverse transcription-polymerase chain reaction tests. RNA of SARS-CoV-2 from specimens was reverse-transcribed and analysed using WGS. SNVs were extracted and used for lineage determination, phylogenetic tree analysis, and median-joining analysis. FINDINGS: The lineage of SARS-CoV-2 that was associated with outbreak in Showa University Hospital was B.1.1.214, which was consistent with that found in the Kanto metropolitan area during the same period. Consistent with canonical epidemiological observations, haplotype network analysis was successful for the classification of patients. Additionally, phylogenetic tree analysis revealed three independent introductions of the virus into the hospital during the outbreak. Further, median-joining analysis indicated that four patients were directly infected by any of the others in the same cluster. CONCLUSION: Genetic epidemiology with WGS and haplotype networks is useful for tracing transmission and optimizing prevention strategies in nosocomial outbreaks.

19.
Gene Rep ; 26: 101420, 2022 Mar.
Article in English | MEDLINE | ID: covidwho-1499885

ABSTRACT

The ongoing pandemic of COVID-19 caused by the SARS-CoV-2 virus has caused millions of deaths around the globe. The emergence of several variants of the virus with increased transmissibility, disease severity, and ability to escape from the immune system is a cause for concern. Here, we compared the spike protein sequences of 91 human SARS-CoV-2 strains from Iraq to the first reported sequence of the SARS-CoV-2 isolate Wuhan-Hu-1 (China). The strains were isolated between June 2020 and March 2021. Twenty-two distinct mutations were identified within the spike protein regions: L5F, L18F, T19R, S151T, G181A, A222V, A348S, L452 (Q or M), T478K, N501Y, A520S, A522V, A570D, S605A, D614G, Q675H, N679K, P681H, T716I, S982A, A1020S, D1118H. The most frequent mutation was D614G (87/91), followed by S982A (50/91) and A570D (48/91). In addition, a distinct shift was observed in the type of SARS-CoV-2 variants present in 2020 compared to 2021 isolates. In 2020, the B.1.428.1 lineage appeared to be the dominant variant (85%). However, the diversity of the variants increased in 2021, and the majority (73%) of the isolates appeared to belong to the B.1.1.7 lineage (VOC/Alpha variant). To our knowledge, this is the first major genome analysis of SARS-CoV-2 in Iraq. The data from this research could provide insights into SARS-CoV-2 evolution, and could potentially be used to identify an effective vaccine against the disease.

20.
Comput Biol Med ; 139: 104981, 2021 12.
Article in English | MEDLINE | ID: covidwho-1482518

ABSTRACT

BACKGROUND: The SARS-CoV-2 virus caused a worldwide pandemic - although none of its predecessors from the coronavirus family ever achieved such a scale. The key to understanding the global success of SARS-CoV-2 is hidden in its genome. MATERIALS AND METHODS: We retrieved data for 329,942 SARS-CoV-2 records uploaded to the GISAID database from the beginning of the pandemic until January 8, 2021. A Python variant detection script was developed to process the data using pairwise2 from the BioPython library. Sequence alignments were performed for every gene separately (except ORF1ab, which was not studied). Genomes less than 26,000 nucleotides long were excluded from the research. Clustering was performed using HDBSCAN. RESULTS: Here, we addressed the genetic variability of SARS-CoV-2 using 329,942 samples. The analysis yielded 155 SNPs and deletions present in more than 0.3% of the sequences. Clustering results suggested that a proportion of people (2.46%) was infected with a distinct subtype of the B.1.1.7 variant, which contained four to six additional mutations (G28881A, G28882A, G28883C, A23403G, A28095T, G25437T). Two clusters were formed by mutations in the samples uploaded predominantly by Denmark and Australia (1.48% and 2.51%, respectively). A correlation coefficient matrix detected 160 pairs of mutations (correlation coefficient greater than 0.7). We also addressed the completeness of the GISAID database, patient gender, and age. Finally, we found ORF6 and E to be the most conserved genes (96.15% and 94.66% of the sequences totally match the reference, respectively). Our results indicate multiple areas for further research in both SARS-CoV-2 studies and health science.
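
The correlation step can be sketched as follows, assuming each mutation is encoded as a 0/1 presence vector across samples and that the Pearson coefficient is used; the abstract does not say which coefficient the authors computed. The function names and the six-sample toy data are hypothetical; only the 0.7 threshold and the mutation labels come from the abstract.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[int], ys: List[int]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(
    presence: Dict[str, List[int]], threshold: float = 0.7
) -> List[Tuple[str, str, float]]:
    """List mutation pairs whose presence vectors correlate above the threshold."""
    pairs = []
    for a, b in combinations(sorted(presence), 2):
        r = pearson(presence[a], presence[b])
        if r > threshold:
            pairs.append((a, b, r))
    return pairs


# Invented presence/absence data for six samples.
toy = {
    "G28881A": [1, 1, 1, 0, 0, 0],
    "G28882A": [1, 1, 1, 0, 0, 0],
    "A23403G": [1, 0, 1, 0, 1, 0],
}
for a, b, r in correlated_pairs(toy):
    print(a, b, round(r, 2))  # G28881A G28882A 1.0
```

On hundreds of thousands of genomes a vectorized library would be the practical choice; the pure-Python version is only meant to make the definition explicit.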


Subject(s)
COVID-19 , SARS-CoV-2 , Genome, Viral , Humans , Mutation , Phylogeny